Capstone Project at General Assembly:
If you will allow me a few paragraphs, I would like to share a few thoughts on how my capstone reached the state you are seeing today.
Background.
This project is the de facto "culmination" of just over three months of coursework at General Assembly.
As a digital native generally familiar with what data is, but with fairly pedestrian competency in coding, structuring, manipulating, and generating insights, I figured it would be fun to build something that is hopefully useful for fellow learners in accelerating their learning.
Seeking something reasonably challenging (from a learning rather than a data-acquisition perspective) with practical real-world applications, I settled on computer vision (CV).
In learning and execution, I realized there are many code references one can take to fulfill project requirements. However, there are also many hidden challenges, from troubleshooting GPU usage to deprecated references. For those relatively new to the field, it is worthwhile to "hack" your way to success. Do set time limits so you don't get sucked into "black holes"!
I hope the below offers reasonable breadth on the topic and accelerates your learning of computer vision. Feel free to share your feedback!
For this exercise, I was not overly fussed about getting metrics and scoring in perfect order.
With computer vision (CV), it is fairly easy to tell whether an image is classified correctly, or whether objects were detected.
From research, common metrics for CV tasks include classification accuracy, precision and recall, and intersection over union (IoU) for detection:
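As a concrete example of one such metric, here is a minimal sketch of intersection over union (IoU) for two bounding boxes; the `(x, y, width, height)` tuple format is an assumption matching the bounding-box columns of the dataset used later, not code from the project itself:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    # Intersection rectangle (clamped to zero when boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax1 + aw, bx1 + bw), min(ay1 + ah, by1 + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union

print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # overlap 50 over union 150 -> ~0.333
```

An IoU of 1.0 means a perfect match between predicted and ground-truth boxes; detection benchmarks often count a prediction as correct when IoU exceeds 0.5.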
# pip install pandas
# pip install opencv-python
# pip install matplotlib
# pip install seaborn
import tensorflow as tf
import os, warnings
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import glob
plt.style.use('seaborn')
from PIL import Image
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow import keras
from tensorflow.keras.applications import VGG16
print(tf.__version__)
2.6.0
# GPU check; to use your GPUs, tf-gpu should be installed
# access jupyter notebook from tf-gpu session
# in anaconda prompt: conda activate tf-gpu
print("Num GPUs Available: ",
len(tf.config.list_physical_devices('GPU')))
Num GPUs Available: 1
# # Reference: https://www.tensorflow.org/guide/gpu#setup
# tf.debugging.set_log_device_placement(True)
# # Create some tensors (place on CPU)
# with tf.device('/CPU:0'):
#     a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
#     b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# # Run on GPU
# c = tf.matmul(a, b)
# print(c)
# gpus = tf.config.list_physical_devices('GPU')
# if gpus:
#     # Restrict TensorFlow to only use the first GPU
#     try:
#         tf.config.set_visible_devices(gpus[0], 'GPU')
#         logical_gpus = tf.config.list_logical_devices('GPU')
#         print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
#     except RuntimeError as e:
#         # Visible devices must be set before GPUs have been initialized
#         print(e)
Acquisition: Directly downloaded from source (Kaggle CelebFaces Attributes (CelebA) Dataset)
(Note: The full dataset will not be uploaded onto GitHub. Please download it separately if needed.)
Info from source:
- img_align_celeba.zip: all the face images, cropped and aligned
- list_eval_partition.csv: recommended partitioning of images into training, validation and testing sets. Images 1-162770 are training, 162771-182637 are validation, 182638-202599 are testing
- list_bbox_celeba.csv: bounding box information for each image. "x_1" and "y_1" represent the upper-left corner of the bounding box; "width" and "height" represent its width and height
- list_landmarks_align_celeba.csv: image landmarks and their respective coordinates. There are 5 landmarks: left eye, right eye, nose, left mouth, right mouth
- list_attr_celeba.csv: attribute labels for each image. There are 40 attributes. "1" represents positive while "-1" represents negative

# read in dataset
celeb_folder_path = './kaggle_celeb_images/'
dataset = tf.keras.preprocessing.image_dataset_from_directory(
directory=celeb_folder_path
)
Found 202600 files belonging to 1 classes.
dataset.class_names
['img_align_celeba']
dataset.take(1)
<TakeDataset shapes: ((None, 256, 256, 3), (None,)), types: (tf.float32, tf.int32)>
# BGR to RGB function (OpenCV reads images in BGR channel order)
def convert_rgb(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
Sample images from "CelebA" dataset
for images, labels in dataset.take(1):
    for img in range(9):
        ax = plt.subplot(3, 3, img + 1)
        plt.imshow(images[img].numpy().astype('uint8'))
        plt.title(f'class {int(labels[img])}, {images[img].shape}')
        plt.axis("off")
Images in dataset conform to only one class (celebrity) and are of shape (256, 256, 3).
df_celeb_attributes = pd.read_csv('./kaggle_celeb_images/list_attr_celeba.csv')
df_celeb_attributes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 202599 entries, 0 to 202598 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 image_id 202599 non-null object 1 5_o_Clock_Shadow 202599 non-null int64 2 Arched_Eyebrows 202599 non-null int64 3 Attractive 202599 non-null int64 4 Bags_Under_Eyes 202599 non-null int64 5 Bald 202599 non-null int64 6 Bangs 202599 non-null int64 7 Big_Lips 202599 non-null int64 8 Big_Nose 202599 non-null int64 9 Black_Hair 202599 non-null int64 10 Blond_Hair 202599 non-null int64 11 Blurry 202599 non-null int64 12 Brown_Hair 202599 non-null int64 13 Bushy_Eyebrows 202599 non-null int64 14 Chubby 202599 non-null int64 15 Double_Chin 202599 non-null int64 16 Eyeglasses 202599 non-null int64 17 Goatee 202599 non-null int64 18 Gray_Hair 202599 non-null int64 19 Heavy_Makeup 202599 non-null int64 20 High_Cheekbones 202599 non-null int64 21 Male 202599 non-null int64 22 Mouth_Slightly_Open 202599 non-null int64 23 Mustache 202599 non-null int64 24 Narrow_Eyes 202599 non-null int64 25 No_Beard 202599 non-null int64 26 Oval_Face 202599 non-null int64 27 Pale_Skin 202599 non-null int64 28 Pointy_Nose 202599 non-null int64 29 Receding_Hairline 202599 non-null int64 30 Rosy_Cheeks 202599 non-null int64 31 Sideburns 202599 non-null int64 32 Smiling 202599 non-null int64 33 Straight_Hair 202599 non-null int64 34 Wavy_Hair 202599 non-null int64 35 Wearing_Earrings 202599 non-null int64 36 Wearing_Hat 202599 non-null int64 37 Wearing_Lipstick 202599 non-null int64 38 Wearing_Necklace 202599 non-null int64 39 Wearing_Necktie 202599 non-null int64 40 Young 202599 non-null int64 dtypes: int64(40), object(1) memory usage: 63.4+ MB
df_celeb_attributes.head(3).T
| 0 | 1 | 2 | |
|---|---|---|---|
| image_id | 000001.jpg | 000002.jpg | 000003.jpg |
| 5_o_Clock_Shadow | -1 | -1 | -1 |
| Arched_Eyebrows | 1 | -1 | -1 |
| Attractive | 1 | -1 | -1 |
| Bags_Under_Eyes | -1 | 1 | -1 |
| Bald | -1 | -1 | -1 |
| Bangs | -1 | -1 | -1 |
| Big_Lips | -1 | -1 | 1 |
| Big_Nose | -1 | 1 | -1 |
| Black_Hair | -1 | -1 | -1 |
| Blond_Hair | -1 | -1 | -1 |
| Blurry | -1 | -1 | 1 |
| Brown_Hair | 1 | 1 | -1 |
| Bushy_Eyebrows | -1 | -1 | -1 |
| Chubby | -1 | -1 | -1 |
| Double_Chin | -1 | -1 | -1 |
| Eyeglasses | -1 | -1 | -1 |
| Goatee | -1 | -1 | -1 |
| Gray_Hair | -1 | -1 | -1 |
| Heavy_Makeup | 1 | -1 | -1 |
| High_Cheekbones | 1 | 1 | -1 |
| Male | -1 | -1 | 1 |
| Mouth_Slightly_Open | 1 | 1 | -1 |
| Mustache | -1 | -1 | -1 |
| Narrow_Eyes | -1 | -1 | 1 |
| No_Beard | 1 | 1 | 1 |
| Oval_Face | -1 | -1 | -1 |
| Pale_Skin | -1 | -1 | -1 |
| Pointy_Nose | 1 | -1 | 1 |
| Receding_Hairline | -1 | -1 | -1 |
| Rosy_Cheeks | -1 | -1 | -1 |
| Sideburns | -1 | -1 | -1 |
| Smiling | 1 | 1 | -1 |
| Straight_Hair | 1 | -1 | -1 |
| Wavy_Hair | -1 | -1 | 1 |
| Wearing_Earrings | 1 | -1 | -1 |
| Wearing_Hat | -1 | -1 | -1 |
| Wearing_Lipstick | 1 | -1 | -1 |
| Wearing_Necklace | -1 | -1 | -1 |
| Wearing_Necktie | -1 | -1 | -1 |
| Young | 1 | 1 | 1 |
df_celeb_attributes.describe()
| 5_o_Clock_Shadow | Arched_Eyebrows | Attractive | Bags_Under_Eyes | Bald | Bangs | Big_Lips | Big_Nose | Black_Hair | Blond_Hair | ... | Sideburns | Smiling | Straight_Hair | Wavy_Hair | Wearing_Earrings | Wearing_Hat | Wearing_Lipstick | Wearing_Necklace | Wearing_Necktie | Young | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 202599.000000 | 202599.000000 | 202599.00000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | ... | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.00000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 | 202599.000000 |
| mean | -0.777728 | -0.466039 | 0.02501 | -0.590857 | -0.955113 | -0.696849 | -0.518408 | -0.530935 | -0.521498 | -0.704016 | ... | -0.886979 | -0.035839 | -0.583196 | -0.360866 | -0.62215 | -0.903079 | -0.055129 | -0.754066 | -0.854570 | 0.547234 |
| std | 0.628602 | 0.884766 | 0.99969 | 0.806778 | 0.296241 | 0.717219 | 0.855135 | 0.847414 | 0.853255 | 0.710186 | ... | 0.461811 | 0.999360 | 0.812333 | 0.932620 | 0.78290 | 0.429475 | 0.998482 | 0.656800 | 0.519338 | 0.836982 |
| min | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 |
| 25% | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 1.000000 |
| 50% | -1.000000 | -1.000000 | 1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | 1.000000 |
| 75% | -1.000000 | 1.000000 | 1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.00000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 |
| max | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
8 rows × 40 columns
plt.figure(figsize=(16,10))
attr_corrs = df_celeb_attributes.drop(columns='image_id').corr()
sns.heatmap(attr_corrs,
cmap='bone', vmax=1, vmin=-1,
mask=np.triu(np.ones_like(attr_corrs, dtype=bool)));
plt.title('Celebrity Attributes Correlation Map', fontsize=18)
plt.savefig('./graphics/attribute_corr.png')
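To read the strongest relationships off the heatmap numerically, the attribute pairs can be ranked by absolute correlation. A minimal sketch on a toy frame (with the real data, substitute `df_celeb_attributes.drop(columns='image_id')` for `df`):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the attributes frame (values in {-1, 1}, as in CelebA)
df = pd.DataFrame({
    'Male':             [1, 1, -1, -1, 1, -1],
    'Wearing_Lipstick': [-1, -1, 1, 1, -1, 1],
    'Smiling':          [1, -1, 1, -1, -1, 1],
})

corr = df.corr()
# Keep only the upper triangle so each pair appears once, then rank by |corr|
pairs = corr.where(np.triu(np.ones_like(corr, dtype=bool), k=1))
ranked = pairs.stack().abs().sort_values(ascending=False)
print(ranked.head())
```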
As will be seen below, the bounding boxes for the CelebA dataset likely refer to the original (uncropped) image frames: each row gives the upper-left corner (x_1, y_1) of the box together with its width and height.
# read in data
df_celeb_bbox = pd.read_csv('./kaggle_celeb_images/list_bbox_celeba.csv')
# check for nulls
df_celeb_bbox.isnull().sum()
image_id 0 x_1 0 y_1 0 width 0 height 0 dtype: int64
# view data
df_celeb_bbox.head(3)
| image_id | x_1 | y_1 | width | height | |
|---|---|---|---|---|---|
| 0 | 000001.jpg | 95 | 71 | 226 | 313 |
| 1 | 000002.jpg | 72 | 94 | 221 | 306 |
| 2 | 000003.jpg | 216 | 59 | 91 | 126 |
# plot scatter for (x_1, y_1)
plt.scatter(df_celeb_bbox['x_1'], df_celeb_bbox['y_1'],
marker='x', alpha=0.1);
plt.title('bbox (x_1, y_1) scatter', fontsize=18)
plt.savefig('./graphics/bbox_xy_scatter.png')
# plot scatter for width and height
plt.scatter(df_celeb_bbox['width'], df_celeb_bbox['height'],
marker='x', alpha=0.1);
plt.title('bbox width, height scatter', fontsize=18)
plt.savefig('./graphics/bbox_widthheight_scatter.png')
df_celeb_eval_partition = pd.read_csv('./kaggle_celeb_images/list_eval_partition.csv')
plt.plot(df_celeb_eval_partition['partition']);
plt.title('eval_partition', fontsize=18)
plt.savefig('./graphics/eval_partition.png')
df_celeb_landmarks = pd.read_csv('./kaggle_celeb_images/list_landmarks_align_celeba.csv')
df_celeb_landmarks.head(3)
| image_id | lefteye_x | lefteye_y | righteye_x | righteye_y | nose_x | nose_y | leftmouth_x | leftmouth_y | rightmouth_x | rightmouth_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000001.jpg | 69 | 109 | 106 | 113 | 77 | 142 | 73 | 152 | 108 | 154 |
| 1 | 000002.jpg | 69 | 110 | 107 | 112 | 81 | 135 | 70 | 151 | 108 | 153 |
| 2 | 000003.jpg | 76 | 112 | 104 | 106 | 108 | 128 | 74 | 156 | 98 | 158 |
df_celeb_landmarks.isnull().sum()
image_id 0 lefteye_x 0 lefteye_y 0 righteye_x 0 righteye_y 0 nose_x 0 nose_y 0 leftmouth_x 0 leftmouth_y 0 rightmouth_x 0 rightmouth_y 0 dtype: int64
# check if landmarks are bounded within (256, 256) frame
df_celeb_landmarks.describe() < 256
| lefteye_x | lefteye_y | righteye_x | righteye_y | nose_x | nose_y | leftmouth_x | leftmouth_y | rightmouth_x | rightmouth_y | |
|---|---|---|---|---|---|---|---|---|---|---|
| count | False | False | False | False | False | False | False | False | False | False |
| mean | True | True | True | True | True | True | True | True | True | True |
| std | True | True | True | True | True | True | True | True | True | True |
| min | True | True | True | True | True | True | True | True | True | True |
| 25% | True | True | True | True | True | True | True | True | True | True |
| 50% | True | True | True | True | True | True | True | True | True | True |
| 75% | True | True | True | True | True | True | True | True | True | True |
| max | True | True | True | True | True | True | True | True | True | True |
# read in first image
drawing = convert_rgb(cv2.imread('./kaggle_celeb_images/img_align_celeba/img_align_celeba/000001.jpg'))
# draw in landmarks
# eyes
cv2.line(drawing,
         pt1=(df_celeb_landmarks['lefteye_x'][0],
              df_celeb_landmarks['lefteye_y'][0]),
         pt2=(df_celeb_landmarks['righteye_x'][0],
              df_celeb_landmarks['righteye_y'][0]),
         color=(255, 255, 0), thickness=1)
eye_width = 25
eye_height = 15
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['lefteye_x'][0] - int(eye_width/2),
                   df_celeb_landmarks['lefteye_y'][0] - int(eye_height/2)),
              pt2=(df_celeb_landmarks['lefteye_x'][0] + int(eye_width/2),
                   df_celeb_landmarks['lefteye_y'][0] + int(eye_height/2)),
              color=(255, 0, 0), thickness=1)
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['righteye_x'][0] - int(eye_width/2),
                   df_celeb_landmarks['righteye_y'][0] - int(eye_height/2)),
              pt2=(df_celeb_landmarks['righteye_x'][0] + int(eye_width/2),
                   df_celeb_landmarks['righteye_y'][0] + int(eye_height/2)),
              color=(255, 0, 0), thickness=1)
# mouth
cv2.line(drawing,
         pt1=(df_celeb_landmarks['leftmouth_x'][0],
              df_celeb_landmarks['leftmouth_y'][0]),
         pt2=(df_celeb_landmarks['rightmouth_x'][0],
              df_celeb_landmarks['rightmouth_y'][0]),
         color=(0, 255, 255), thickness=1)
mouth_height = 15
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['leftmouth_x'][0],
                   df_celeb_landmarks['leftmouth_y'][0]),
              pt2=(df_celeb_landmarks['rightmouth_x'][0],
                   df_celeb_landmarks['rightmouth_y'][0] + mouth_height),
              color=(0, 0, 255), thickness=1)
# only one (x, y) coordinate for nose
nose_width = 20
nose_height = 30
cv2.rectangle(drawing,
              pt1=(df_celeb_landmarks['nose_x'][0] - int(nose_width/2),
                   df_celeb_landmarks['nose_y'][0] - int(nose_height * 0.8)),
              pt2=(df_celeb_landmarks['nose_x'][0] + int(nose_width/2),
                   df_celeb_landmarks['nose_y'][0] + int(nose_height * 0.2)),
              color=(0, 255, 0), thickness=1)
plt.axis('off')
plt.imshow(drawing);
plt.savefig('./graphics/face_with_boxes.png')
The landmark data give five keypoints for each frontal face image: left eye, right eye, nose, left mouth corner and right mouth corner.
To keep things simple, we will pull only the eye features, cropping each into a 32 x 32 frame.
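One caveat with the crop loops below: if a landmark sits within 16 px of the image border, `x` or `y` goes negative and the slice silently returns a smaller (or empty) crop. A hedged helper that clamps the window to the frame (`safe_crop` is a hypothetical name, not part of the original code):

```python
import numpy as np

def safe_crop(img, cx, cy, width, height):
    """Crop a width x height window centred on (cx, cy), clamped to the image."""
    h, w = img.shape[:2]
    x = min(max(cx - width // 2, 0), w - width)
    y = min(max(cy - height // 2, 0), h - height)
    return img[y:y + height, x:x + width]

# Usage on a dummy 218 x 178 image (the CelebA aligned frame size)
img = np.zeros((218, 178, 3), dtype=np.uint8)
crop = safe_crop(img, cx=5, cy=5, width=32, height=32)  # landmark near the corner
print(crop.shape)  # (32, 32, 3)
```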
for i in range(1, 1001):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['lefteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['lefteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./data/lefteye/lefteye_{str(i).zfill(6)}.jpg', crop_img)
display_count = 15
ncols_display = 5
for i in range(1, display_count + 1):
    ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
    ax.imshow(convert_rgb(cv2.imread(f'./data/lefteye/lefteye_{str(i).zfill(6)}.jpg')))
    ax.set_title(f'{i}')
    ax.axis('off')
plt.suptitle(f'Left eyes: First {display_count} images', fontsize=18);
plt.tight_layout(pad=2)
# plt.savefig('./graphics/left_eyes_15.png')

for i in range(1, 1001):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['righteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['righteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./data/righteye/righteye_{str(i).zfill(6)}.jpg', crop_img)
display_count = 15
ncols_display = 5
for i in range(1, display_count + 1):
    ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
    ax.imshow(convert_rgb(cv2.imread(f'./data/righteye/righteye_{str(i).zfill(6)}.jpg')))
    ax.set_title(f'{i}')
    ax.axis('off')
plt.suptitle(f'Right eyes: First {display_count} images', fontsize=18);
plt.tight_layout(pad=2)
# plt.savefig('./graphics/right_eyes_15.png')

for i in range(1001, 1101):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['lefteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['lefteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./test/lefteye_{str(i).zfill(6)}.jpg', crop_img)
for i in range(1001, 1101):
    img = cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')
    eye_width = 32
    eye_height = 32
    x = df_celeb_landmarks['righteye_x'][i - 1] - int(eye_width/2)
    y = df_celeb_landmarks['righteye_y'][i - 1] - int(eye_height/2)
    crop_img = img[y : y + eye_height,
                   x : x + eye_width]
    cv2.imwrite(f'./test/righteye_{str(i).zfill(6)}.jpg', crop_img)
Summary:
- Thanks to the integral image representation, computing Haar-like features, which are used for detection, can be done in constant time, i.e. $O(1)$ (check out Big O notation): the time taken does not depend on the size of the rectangle being summed.
- This is important because the number of Haar-like features per image is greater than the number of pixels in the image.
- In the framework, features, rather than pixels, are used directly; simple rectangle features are used.
- Using a cascade of trained classifiers, the number of locations that receive more complex detection is reduced by more than half at each stage.
- Sub-windows that do not "pass" a stage are not processed further.
Reference / Paper: Viola, P. & Jones, M. (2001). "Robust Real-time Object Detection". Second International Workshop on Statistical and Computational Theories of Vision -- Modeling, Learning, Computing, and Sampling. Vancouver, Canada, July 13, 2001. LINK
Useful summary article: LINK
# Loading the image to be tested
test_image = cv2.imread('./kaggle_celeb_images/img_align_celeba/img_align_celeba/000001.jpg')
# Converting to grayscale, as the OpenCV detector expects grayscale input images
test_image_gray = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)
print(f'===== Test Image, {type(test_image_gray)} =====')
print(test_image_gray)
print('===== Shape =====')
print(test_image_gray.shape)
# Displaying grayscale image
plt.axis(False)
plt.imshow(test_image_gray, cmap='gray');
===== Test Image, <class 'numpy.ndarray'> ===== [[233 233 233 ... 232 241 241] [233 233 233 ... 234 241 241] [233 233 233 ... 236 241 242] ... [ 88 63 93 ... 72 73 73] [ 77 85 113 ... 66 68 68] [115 151 192 ... 66 68 68]] ===== Shape ===== (218, 178)
# getting to an integral image using cv2
int_img_cv2 = cv2.integral(test_image_gray)
display(int_img_cv2.shape, int_img_cv2)
(219, 179)
array([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 233, 466, ..., 31180, 31421, 31662],
[ 0, 466, 932, ..., 62286, 62768, 63250],
...,
[ 0, 38151, 75925, ..., 5598109, 5636022, 5674017],
[ 0, 38228, 76087, ..., 5614686, 5652667, 5690730],
[ 0, 38343, 76353, ..., 5631455, 5669504, 5707635]],
dtype=int32)
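The point of the integral image is that any rectangle sum comes from just four lookups, independent of rectangle size. A quick numpy check (using `np.cumsum` with zero padding, which mirrors the extra zero row/column that `cv2.integral` returned above):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(218, 178))  # same shape as the test image

# Integral image with a zero row/column prepended, as cv2.integral produces
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    # Four lookups regardless of rectangle size: O(1)
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

print(rect_sum(ii, 50, 30, 64, 48) == img[50:114, 30:78].sum())  # True
```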
The below is a class diagram by César de Souza, who built up the framework in C#.

Reference: https://www.codeproject.com/Articles/441226/Haar-feature-Object-Detection-in-Csharp
While some time was invested in trying to build something similar from scratch in Python, it was decidedly out of reach for now.
Below shows how the Viola-Jones object detection framework is executed using a pre-trained cascade classifier downloaded from the OpenCV GitHub repository.
# Source: https://github.com/opencv/opencv/tree/master/data
haar_cascade_face = cv2.CascadeClassifier(
'./opencv_data/haarcascades/haarcascade_frontalface_alt2.xml'
)
def detect_faces(cascade, test_image, scaleFactor=1.1):
    # create a copy of the image to prevent any changes to the original one
    image_copy = test_image.copy()
    # convert the test image to grayscale, as the OpenCV face detector expects gray images
    gray_image = cv2.cvtColor(image_copy, cv2.COLOR_BGR2GRAY)
    # apply the Haar classifier to detect faces
    faces_rect = cascade.detectMultiScale(
        gray_image,
        scaleFactor=scaleFactor,
        minNeighbors=5)
    detection_count = 0
    for (x, y, w, h) in faces_rect:
        cv2.rectangle(
            img=image_copy,
            pt1=(x, y),
            pt2=(x + w, y + h),
            color=(0, 255, 0),
            thickness=2)
        detection_count += 1
    print(f'{detection_count} faces detected')
    return image_copy
faces_rects = haar_cascade_face.detectMultiScale(
test_image_gray,
scaleFactor = 1.2,
minNeighbors = 5);
# Let us print the no. of faces found
print('Faces found: ', len(faces_rects))
Faces found: 1
for (x, y, w, h) in faces_rects:
    cv2.rectangle(test_image, (x, y), (x+w, y+h), (0, 255, 0), 2)
# convert image to RGB and show image
plt.imshow(convert_rgb(test_image));
#loading image
test_image2 = cv2.imread(
'./kaggle_celeb_images/img_align_celeba/img_align_celeba/000002.jpg'
)
plt.imshow(test_image2);
#call the function to detect faces
faces = detect_faces(haar_cascade_face, test_image2)
1 faces detected
#convert to RGB and display image
plt.imshow(convert_rgb(faces))
plt.axis('off');
%%time
#loading image
test_image3 = cv2.imread(
'./graphics/golden_globes_1.png'
)
#call the function to detect faces
faces = detect_faces(haar_cascade_face, test_image3)
#convert to RGB and display image
plt.figure(figsize=(16,10))
plt.imshow(convert_rgb(faces))
plt.axis('off');
7 faces detected CPU times: total: 1.02 s Wall time: 234 ms
(-0.5, 1370.5, 908.5, -0.5)
cv2.imwrite('./graphics/detected_faces.png', faces)
True
# Credit: https://github.com/nithindd/aind_computer_vision/blob/master/CV_project.ipynb
def blurface(image):
    denoised_image = cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 21, 7)
    gray = cv2.cvtColor(denoised_image, cv2.COLOR_RGB2GRAY)
    # Extract the pre-trained face detector from an xml file
    face_cascade = cv2.CascadeClassifier('./opencv_data/haarcascades/haarcascade_frontalface_default.xml')
    # Detect the faces in image
    faces = face_cascade.detectMultiScale(gray, 1.1, 10)
    # Make a copy of the original image to blur
    final_image = np.copy(image)
    # Blur with a 40 x 40 averaging kernel (1/1600 = 1/40^2)
    width = 40
    kernel = np.ones((width, width), np.float32) / 1600
    image_with_blur = cv2.filter2D(image, -1, kernel)
    for (x, y, w, h) in faces:
        padding = 30
        x_start = max(x - padding, 0)
        y_start = max(y - padding, 0)
        x_end = min(x + w + padding, image.shape[1])
        y_end = min(y + h + padding, image.shape[0])
        final_image[y_start:y_end, x_start:x_end] = cv2.filter2D(image_with_blur[y_start:y_end, x_start:x_end], -1, kernel)
    return final_image
plt.imshow(convert_rgb(blurface(test_image3)))
plt.axis('off');
plt.savefig('./graphics/blurred_faces.png')
Sequential CNN model with custom dropout layer
Load in data
data_dir = './data/'
img_height = 32
img_width = 32
batch_size = 20
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(img_height, img_width),
    batch_size=batch_size)
train_ds_class_names = train_ds.class_names
print('train_ds Classes: ', train_ds_class_names)
val_ds_class_names = val_ds.class_names
print('val_ds Classes: ', val_ds_class_names)
Found 2000 files belonging to 2 classes. Using 1600 files for training. Found 2000 files belonging to 2 classes. Using 400 files for validation. train_ds Classes: ['lefteye', 'righteye'] val_ds Classes: ['lefteye', 'righteye']
You can create your own layers with specific characteristics by defining a new class for that layer object that inherits from an existing layer class and calls the parent implementation via super().
# create Monte-Carlo dropout class (MCDropout)
class MCDropout(keras.layers.Dropout):
    def call(self, inputs):
        # keep dropout active even at inference time
        return super().call(inputs, training=True)
# instantiate model with layers
model_cnn_mcdropout = keras.models.Sequential(
    [
        keras.layers.Conv2D(filters=8,
                            kernel_size=3,
                            activation='relu',
                            input_shape=(32, 32, 3)),
        keras.layers.Conv2D(filters=16,
                            kernel_size=3,
                            activation='relu'),
        keras.layers.Flatten(),
        MCDropout(rate=0.2),
        keras.layers.Dense(256,
                           activation='relu',
                           kernel_initializer='he_normal'),
        MCDropout(rate=0.2),
        keras.layers.Dense(256,
                           activation='relu',
                           kernel_initializer='he_normal'),
        MCDropout(rate=0.2),
        keras.layers.Dense(2,
                           activation='softmax'),
    ]
)
# compile
model_cnn_mcdropout.compile(
optimizer=keras.optimizers.Adam(0.001),
loss=keras.losses.sparse_categorical_crossentropy,
metrics=['acc'],
)
Cross-entropy helps determine how well the model fits the data.
For true labels $y$ and predicted class probabilities $\hat{y}$ over $K$ classes, it is represented by:

$$H(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k$$

In neural networks, the last layer, which is usually a softmax layer, converts the output values into predicted probabilities for each possible class.
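A small numeric illustration of the two pieces (numpy only, not part of the model code): softmax turns raw logits into probabilities, and cross-entropy then scores those probabilities against the true class.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, true_class):
    # Sparse form: only the true class contributes to the sum
    return -np.log(probs[true_class])

logits = np.array([2.0, 0.5])   # raw network outputs for classes 0 and 1
probs = softmax(logits)
print(probs.round(3))           # confident in class 0
print(cross_entropy(probs, 0))  # small loss if class 0 is correct
print(cross_entropy(probs, 1))  # large loss if class 1 is correct
```

Confidently wrong predictions are punished far more heavily than hesitant ones, which is exactly the behaviour we want from a training loss.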
%%time
# fit
epochs = 50
history = model_cnn_mcdropout.fit(
    train_ds,
    epochs=epochs,
    validation_data=val_ds,
    verbose=1
)
Epoch 1/50 80/80 [==============================] - 3s 13ms/step - loss: 48.5973 - acc: 0.6419 - val_loss: 1.2608 - val_acc: 0.6950 Epoch 2/50 80/80 [==============================] - 1s 10ms/step - loss: 0.7460 - acc: 0.7812 - val_loss: 0.5641 - val_acc: 0.8250 Epoch 3/50 80/80 [==============================] - 1s 11ms/step - loss: 0.4024 - acc: 0.8831 - val_loss: 0.5366 - val_acc: 0.8575 Epoch 4/50 80/80 [==============================] - 1s 11ms/step - loss: 0.2746 - acc: 0.9187 - val_loss: 0.5860 - val_acc: 0.8650 Epoch 5/50 80/80 [==============================] - 1s 11ms/step - loss: 0.2291 - acc: 0.9212 - val_loss: 0.6301 - val_acc: 0.8750 Epoch 6/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1576 - acc: 0.9475 - val_loss: 0.6130 - val_acc: 0.8725 Epoch 7/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1766 - acc: 0.9444 - val_loss: 0.6402 - val_acc: 0.8325 Epoch 8/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1708 - acc: 0.9419 - val_loss: 0.6403 - val_acc: 0.8550 Epoch 9/50 80/80 [==============================] - 1s 12ms/step - loss: 0.0972 - acc: 0.9656 - val_loss: 0.6961 - val_acc: 0.8850 Epoch 10/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1123 - acc: 0.9638 - val_loss: 0.5293 - val_acc: 0.8925 Epoch 11/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0845 - acc: 0.9712 - val_loss: 0.6031 - val_acc: 0.8750 Epoch 12/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0923 - acc: 0.9700 - val_loss: 0.5084 - val_acc: 0.8750 Epoch 13/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0982 - acc: 0.9712 - val_loss: 0.4964 - val_acc: 0.8800 Epoch 14/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0696 - acc: 0.9737 - val_loss: 0.5132 - val_acc: 0.8950 Epoch 15/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0647 - acc: 0.9850 - val_loss: 0.7304 - val_acc: 0.8975 Epoch 16/50 80/80 
[==============================] - 1s 11ms/step - loss: 0.1216 - acc: 0.9688 - val_loss: 0.5898 - val_acc: 0.9025 Epoch 17/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1194 - acc: 0.9681 - val_loss: 0.5057 - val_acc: 0.9050 Epoch 18/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1174 - acc: 0.9688 - val_loss: 0.5502 - val_acc: 0.8900 Epoch 19/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1221 - acc: 0.9656 - val_loss: 0.6293 - val_acc: 0.8350 Epoch 20/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0989 - acc: 0.9731 - val_loss: 0.5575 - val_acc: 0.8875 Epoch 21/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1046 - acc: 0.9669 - val_loss: 0.7082 - val_acc: 0.8700 Epoch 22/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0844 - acc: 0.9731 - val_loss: 0.7772 - val_acc: 0.8675 Epoch 23/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0609 - acc: 0.9825 - val_loss: 0.5562 - val_acc: 0.8975 Epoch 24/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0368 - acc: 0.9887 - val_loss: 0.5517 - val_acc: 0.9175 Epoch 25/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1629 - acc: 0.9581 - val_loss: 0.6989 - val_acc: 0.8575 Epoch 26/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1525 - acc: 0.9656 - val_loss: 0.5893 - val_acc: 0.8925 Epoch 27/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0702 - acc: 0.9800 - val_loss: 0.7342 - val_acc: 0.9125 Epoch 28/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0291 - acc: 0.9900 - val_loss: 0.6845 - val_acc: 0.9075 Epoch 29/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0192 - acc: 0.9956 - val_loss: 0.5188 - val_acc: 0.9200 Epoch 30/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0122 - acc: 0.9950 - val_loss: 0.7115 - val_acc: 0.9050 Epoch 31/50 80/80 
[==============================] - 1s 11ms/step - loss: 0.0424 - acc: 0.9875 - val_loss: 0.8162 - val_acc: 0.8675 Epoch 32/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0542 - acc: 0.9850 - val_loss: 0.5119 - val_acc: 0.9275 Epoch 33/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0290 - acc: 0.9900 - val_loss: 0.6000 - val_acc: 0.9150 Epoch 34/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0637 - acc: 0.9875 - val_loss: 0.5677 - val_acc: 0.9425 Epoch 35/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0630 - acc: 0.9869 - val_loss: 0.6466 - val_acc: 0.9100 Epoch 36/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0328 - acc: 0.9912 - val_loss: 0.4655 - val_acc: 0.9175 Epoch 37/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0186 - acc: 0.9937 - val_loss: 0.6348 - val_acc: 0.9275 Epoch 38/50 80/80 [==============================] - 1s 12ms/step - loss: 0.0557 - acc: 0.9869 - val_loss: 0.6479 - val_acc: 0.9125 Epoch 39/50 80/80 [==============================] - 1s 12ms/step - loss: 0.1798 - acc: 0.9525 - val_loss: 0.5913 - val_acc: 0.8775 Epoch 40/50 80/80 [==============================] - 1s 12ms/step - loss: 0.1149 - acc: 0.9681 - val_loss: 0.8366 - val_acc: 0.8900 Epoch 41/50 80/80 [==============================] - 1s 12ms/step - loss: 0.0423 - acc: 0.9862 - val_loss: 0.8819 - val_acc: 0.9100 Epoch 42/50 80/80 [==============================] - 1s 12ms/step - loss: 0.0846 - acc: 0.9837 - val_loss: 1.0646 - val_acc: 0.8525 Epoch 43/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0318 - acc: 0.9900 - val_loss: 0.6441 - val_acc: 0.8950 Epoch 44/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0380 - acc: 0.9869 - val_loss: 1.1316 - val_acc: 0.8975 Epoch 45/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0224 - acc: 0.9937 - val_loss: 0.6188 - val_acc: 0.9025 Epoch 46/50 80/80 
[==============================] - 1s 11ms/step - loss: 0.0250 - acc: 0.9925 - val_loss: 0.4991 - val_acc: 0.9250 Epoch 47/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0172 - acc: 0.9969 - val_loss: 0.6428 - val_acc: 0.9250 Epoch 48/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0475 - acc: 0.9900 - val_loss: 1.0592 - val_acc: 0.8700 Epoch 49/50 80/80 [==============================] - 1s 11ms/step - loss: 0.0711 - acc: 0.9869 - val_loss: 1.2203 - val_acc: 0.8425 Epoch 50/50 80/80 [==============================] - 1s 11ms/step - loss: 0.1007 - acc: 0.9800 - val_loss: 0.9689 - val_acc: 0.8675 CPU times: total: 2min 41s Wall time: 49.1 s
GPU (~50s)
# save model
model_cnn_mcdropout.save('./models/cnn_mcdropout')
INFO:tensorflow:Assets written to: ./models/cnn_mcdropout\assets
history.history
{'loss': [48.5973, 0.7460, 0.4024, ..., 0.0711, 0.1007],
 'acc': [0.6419, 0.7813, 0.8831, ..., 0.9869, 0.9800],
 'val_loss': [1.2608, 0.5641, 0.5366, ..., 1.2203, 0.9689],
 'val_acc': [0.6950, 0.8250, 0.8575, ..., 0.8425, 0.8675]}
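Rather than eyeballing the raw dict, it can help to wrap the history in a DataFrame and query it. A minimal sketch using a tiny stand-in dict (the first three epochs above); the real `history.history` from `model.fit()` would drop in directly:

```python
import pandas as pd

# Stand-in for history.history (first three epochs, for illustration only)
hist = {'loss': [48.5973, 0.7460, 0.4024],
        'acc': [0.6419, 0.7813, 0.8831],
        'val_loss': [1.2608, 0.5641, 0.5366],
        'val_acc': [0.6950, 0.8250, 0.8575]}
df_hist = pd.DataFrame(hist)
df_hist.index.name = 'epoch'

# Epoch with the best validation accuracy
best = int(df_hist['val_acc'].idxmax())
print(best, df_hist.loc[best, 'val_acc'])  # → 2 0.8575
```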
# summarize history for accuracy
plt.figure(figsize=(12,8))
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model_cnn_mcdropout accuracy', fontsize=18)
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.tight_layout()
plt.savefig('./graphics/model_cnn_acc.png')
# summarize history for loss
plt.figure(figsize=(12,8))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model_cnn_mcdropout loss', fontsize=18)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.tight_layout()
plt.savefig('./graphics/model_cnn_loss.png')
model_cnn_mcdropout.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 30, 30, 8)         224
conv2d_1 (Conv2D)            (None, 28, 28, 16)        1168
flatten (Flatten)            (None, 12544)             0
mc_dropout (MCDropout)       (None, 12544)             0
dense (Dense)                (None, 256)               3211520
mc_dropout_1 (MCDropout)     (None, 256)               0
dense_1 (Dense)              (None, 256)               65792
mc_dropout_2 (MCDropout)     (None, 256)               0
dense_2 (Dense)              (None, 2)                 514
=================================================================
Total params: 3,279,218
Trainable params: 3,279,218
Non-trainable params: 0
_________________________________________________________________
Pre-trained VGG16 model
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions
from tensorflow.keras.applications.vgg16 import VGG16
# load the model
model_vgg16 = VGG16()
# load an image from file
plt.imshow(convert_rgb(cv2.imread('./data/lefteye/lefteye_000001.jpg')))
# load in version with target size set to suit VGG16
image = load_img('./data/lefteye/lefteye_000001.jpg',
target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# predict the probability across all output classes
yhat = model_vgg16.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))
shower_curtain (17.55%)
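Under the hood, `decode_predictions` simply ranks the 1,000 softmax outputs and keeps the top entries. A sketch of that ranking in plain NumPy (the class indices below are made up for illustration, not the real ImageNet indices):

```python
import numpy as np

# Fake softmax output shaped like VGG16's (1, 1000) prediction vector.
# Indices 879 and 433 are illustrative, not the real ImageNet class indices.
yhat = np.full((1, 1000), 1e-4)
yhat[0, 879] = 0.1755
yhat[0, 433] = 0.0900

# Rank probabilities in descending order and keep the five largest
top5 = np.argsort(yhat[0])[::-1][:5]
print(top5[:2].tolist())  # → [879, 433]
```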
The "left" eye (or any eye) is not recognised by the pretrained model — its top guess here is "shower_curtain".
This is only to be expected, since VGG16 was never trained on facial features.
The way to resolve this is transfer learning: reuse VGG16's convolutional base as a feature extractor and train a new classification head on our own classes.
model_vgg16.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
flatten (Flatten)            (None, 25088)             0
fc1 (Dense)                  (None, 4096)              102764544
fc2 (Dense)                  (None, 4096)              16781312
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
In building the custom model, we effectively place the pretrained base layers at the start of a new tf.keras.models.Sequential() model.
The aim is to have a well-trained model perform feature extraction; we do not care about the classes VGG16 was originally trained on (the 1,000 ImageNet classes, learned from over a million images).
Hence, we set:
include_top = False and
pretrained_base.trainable = False.
pretrained_base = VGG16(
weights='imagenet', include_top=False, input_shape=(32,32,3)
)
pretrained_base.trainable = False
model_vgg16_custom = keras.models.Sequential(
[
pretrained_base,
# add global average pooling layer to "soften" computational intensity for model
keras.layers.GlobalAveragePooling2D(),
keras.layers.Flatten(),
keras.layers.Dense(256, activation=tf.nn.relu, name="dense1"),
keras.layers.Dense(10, activation=tf.nn.relu, name="dense2"),
keras.layers.Dense(2, activation=tf.nn.softmax)
]
)
# compile
model_vgg16_custom.compile(
optimizer=tf.keras.optimizers.Adam(),
loss='sparse_categorical_crossentropy',
metrics=['acc']
)
%%time
# fit
epochs = 50
history_vgg16 = model_vgg16_custom.fit(train_ds,
validation_data=val_ds,
epochs=epochs,
verbose=1)
Epoch 1/50 80/80 [==============================] - 3s 21ms/step - loss: 1.4765 - acc: 0.5100 - val_loss: 0.6965 - val_acc: 0.4500 Epoch 2/50 80/80 [==============================] - 1s 17ms/step - loss: 0.6876 - acc: 0.5144 - val_loss: 0.7077 - val_acc: 0.4500 Epoch 3/50 80/80 [==============================] - 1s 17ms/step - loss: 0.6728 - acc: 0.5581 - val_loss: 0.6859 - val_acc: 0.6100 Epoch 4/50 80/80 [==============================] - 1s 17ms/step - loss: 0.6251 - acc: 0.6619 - val_loss: 0.6668 - val_acc: 0.7000 Epoch 5/50 80/80 [==============================] - 1s 17ms/step - loss: 0.6129 - acc: 0.6725 - val_loss: 0.6595 - val_acc: 0.6575 Epoch 6/50 80/80 [==============================] - 1s 17ms/step - loss: 0.5434 - acc: 0.7531 - val_loss: 0.6429 - val_acc: 0.7200 Epoch 7/50 80/80 [==============================] - 1s 16ms/step - loss: 0.4709 - acc: 0.8188 - val_loss: 0.6574 - val_acc: 0.7175 Epoch 8/50 80/80 [==============================] - 1s 17ms/step - loss: 0.4397 - acc: 0.8331 - val_loss: 0.7203 - val_acc: 0.6975 Epoch 9/50 80/80 [==============================] - 1s 17ms/step - loss: 0.3785 - acc: 0.8594 - val_loss: 0.7965 - val_acc: 0.7250 Epoch 10/50 80/80 [==============================] - 1s 16ms/step - loss: 0.4289 - acc: 0.8181 - val_loss: 0.9504 - val_acc: 0.6850 Epoch 11/50 80/80 [==============================] - 1s 17ms/step - loss: 0.3449 - acc: 0.8725 - val_loss: 0.8182 - val_acc: 0.7075 Epoch 12/50 80/80 [==============================] - 1s 17ms/step - loss: 0.3032 - acc: 0.8900 - val_loss: 0.8576 - val_acc: 0.7325 Epoch 13/50 80/80 [==============================] - 1s 17ms/step - loss: 0.2780 - acc: 0.8950 - val_loss: 0.9569 - val_acc: 0.7375 Epoch 14/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1970 - acc: 0.9275 - val_loss: 0.8951 - val_acc: 0.7325 Epoch 15/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1696 - acc: 0.9369 - val_loss: 1.0397 - val_acc: 0.7500 Epoch 16/50 80/80 
[==============================] - 1s 17ms/step - loss: 0.1957 - acc: 0.9281 - val_loss: 1.0396 - val_acc: 0.7250 Epoch 17/50 80/80 [==============================] - 1s 16ms/step - loss: 0.1819 - acc: 0.9294 - val_loss: 0.9802 - val_acc: 0.7200 Epoch 18/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1727 - acc: 0.9306 - val_loss: 1.0162 - val_acc: 0.7250 Epoch 19/50 80/80 [==============================] - 1s 16ms/step - loss: 0.1089 - acc: 0.9550 - val_loss: 1.2267 - val_acc: 0.7350 Epoch 20/50 80/80 [==============================] - 1s 16ms/step - loss: 0.1104 - acc: 0.9556 - val_loss: 1.3025 - val_acc: 0.7375 Epoch 21/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0948 - acc: 0.9712 - val_loss: 1.4328 - val_acc: 0.7200 Epoch 22/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0752 - acc: 0.9706 - val_loss: 1.6057 - val_acc: 0.6875 Epoch 23/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1452 - acc: 0.9400 - val_loss: 1.4922 - val_acc: 0.7200 Epoch 24/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1052 - acc: 0.9613 - val_loss: 1.5552 - val_acc: 0.7350 Epoch 25/50 80/80 [==============================] - 1s 17ms/step - loss: 0.1236 - acc: 0.9563 - val_loss: 1.2826 - val_acc: 0.7250 Epoch 26/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0852 - acc: 0.9625 - val_loss: 1.3487 - val_acc: 0.7450 Epoch 27/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0617 - acc: 0.9675 - val_loss: 1.4334 - val_acc: 0.7525 Epoch 28/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0389 - acc: 0.9837 - val_loss: 1.5800 - val_acc: 0.7250 Epoch 29/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0316 - acc: 0.9869 - val_loss: 1.5237 - val_acc: 0.7275 Epoch 30/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0535 - acc: 0.9769 - val_loss: 1.5089 - val_acc: 0.7450 Epoch 31/50 80/80 
[==============================] - 1s 17ms/step - loss: 0.0722 - acc: 0.9669 - val_loss: 1.2805 - val_acc: 0.6800 Epoch 32/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0637 - acc: 0.9762 - val_loss: 1.6969 - val_acc: 0.7175 Epoch 33/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0358 - acc: 0.9850 - val_loss: 1.7322 - val_acc: 0.7425 Epoch 34/50 80/80 [==============================] - 1s 16ms/step - loss: 0.1140 - acc: 0.9588 - val_loss: 1.5574 - val_acc: 0.7100 Epoch 35/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0785 - acc: 0.9706 - val_loss: 1.5992 - val_acc: 0.7350 Epoch 36/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0586 - acc: 0.9769 - val_loss: 1.5155 - val_acc: 0.7250 Epoch 37/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0379 - acc: 0.9800 - val_loss: 1.6843 - val_acc: 0.7225 Epoch 38/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0458 - acc: 0.9781 - val_loss: 1.7057 - val_acc: 0.7550 Epoch 39/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0205 - acc: 0.9894 - val_loss: 1.6682 - val_acc: 0.7325 Epoch 40/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0171 - acc: 0.9912 - val_loss: 1.6130 - val_acc: 0.7600 Epoch 41/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0132 - acc: 0.9919 - val_loss: 1.6507 - val_acc: 0.7425 Epoch 42/50 80/80 [==============================] - 1s 16ms/step - loss: 0.0123 - acc: 0.9919 - val_loss: 1.7524 - val_acc: 0.7425 Epoch 43/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0109 - acc: 0.9919 - val_loss: 1.7674 - val_acc: 0.7500 Epoch 44/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0112 - acc: 0.9919 - val_loss: 1.8062 - val_acc: 0.7450 Epoch 45/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0112 - acc: 0.9919 - val_loss: 1.8104 - val_acc: 0.7450 Epoch 46/50 80/80 
[==============================] - 1s 17ms/step - loss: 0.0100 - acc: 0.9919 - val_loss: 1.8506 - val_acc: 0.7450 Epoch 47/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0118 - acc: 0.9919 - val_loss: 1.7947 - val_acc: 0.7375 Epoch 48/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0107 - acc: 0.9919 - val_loss: 1.9970 - val_acc: 0.7250 Epoch 49/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0110 - acc: 0.9919 - val_loss: 1.9097 - val_acc: 0.7450 Epoch 50/50 80/80 [==============================] - 1s 17ms/step - loss: 0.0356 - acc: 0.9856 - val_loss: 2.0097 - val_acc: 0.7375 CPU times: total: 2min 55s Wall time: 1min 9s
CPU (5min); GPU (1min)
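A possible next step, not pursued here, would be fine-tuning: after training the new head, unfreeze just the last VGG16 block and continue training with a much lower learning rate. A sketch (weights=None avoids the download; the layer names are the real VGG16 block names, the learning rate is illustrative):

```python
from tensorflow.keras.applications import VGG16

# Sketch of a fine-tuning pass (not run in this project): unfreeze only the
# block5 layers of the base, keep everything earlier frozen.
base = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

# Recompile with a much smaller learning rate before continuing training:
# model_vgg16_custom.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
#                            loss='sparse_categorical_crossentropy',
#                            metrics=['acc'])
print(sum(l.trainable for l in base.layers))  # block5_conv1..3 + block5_pool
```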
model_vgg16_custom.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
vgg16 (Functional)           (None, 1, 1, 512)         14714688
global_average_pooling2d (Gl (None, 512)               0
flatten_1 (Flatten)          (None, 512)               0
dense1 (Dense)               (None, 256)               131328
dense2 (Dense)               (None, 10)                2570
dense_3 (Dense)              (None, 2)                 22
=================================================================
Total params: 14,848,608
Trainable params: 133,920
Non-trainable params: 14,714,688
_________________________________________________________________
model_vgg16_custom.save('./models/vgg_16_custom')
INFO:tensorflow:Assets written to: ./models/vgg_16_custom\assets
# summarize history for accuracy
plt.figure(figsize=(16,10))
plt.plot(history_vgg16.history['acc'])
plt.plot(history_vgg16.history['val_acc'])
plt.title('model_vgg16_custom accuracy', fontsize=18)
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.savefig('./graphics/vgg_16_custom_acc.png')
# summarize history for loss
plt.figure(figsize=(16,10))
plt.plot(history_vgg16.history['loss'])
plt.plot(history_vgg16.history['val_loss'])
plt.title('model_vgg16_custom loss', fontsize=18)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left', fontsize=12)
plt.savefig('./graphics/vgg_16_custom_loss.png')
We see the training and validation losses diverging — training loss keeps falling while validation loss climbs — a classic sign of overfitting.
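One mitigation would be to let Keras stop training once validation loss stalls, instead of running a fixed 50 epochs. A sketch of the callbacks (patience and factor values are illustrative):

```python
from tensorflow import keras

# Illustrative callbacks to curb overfitting: stop early when val_loss stops
# improving and roll back to the best weights seen so far.
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                  restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                      patience=3),
]
# model_vgg16_custom.fit(train_ds, validation_data=val_ds,
#                        epochs=50, callbacks=callbacks)
```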
The Kaggle dataset offers many possibilities for further experimentation.
Next time, we may wish to build a classifier for whether a person is smiling based on their eye features.
Data in the attributes table includes whether the person in the image is smiling (1) or not (-1).
# Create smile dataframe
df_smile = df_celeb_attributes[['image_id', 'Smiling']].truncate(after=999)
# Lower-case columns
df_smile.columns = map(str.lower, df_smile.columns)
# Check df & truncation
df_smile.tail(3)
|     | image_id   | smiling |
|-----|------------|---------|
| 997 | 000998.jpg | -1      |
| 998 | 000999.jpg | -1      |
| 999 | 001000.jpg | 1       |
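For training, the {-1, 1} labels would typically be remapped to {0, 1}. A sketch using a small stand-in frame shaped like `df_smile` above:

```python
import pandas as pd

# Stand-in frame with the same columns as df_smile (illustrative rows)
df_smile = pd.DataFrame({
    'image_id': ['000998.jpg', '000999.jpg', '001000.jpg'],
    'smiling': [-1, -1, 1],
})

# Remap {-1, 1} -> {0, 1} so the column works as a binary target
df_smile['smiling'] = df_smile['smiling'].map({-1: 0, 1: 1})
print(df_smile['smiling'].tolist())  # → [0, 0, 1]
```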
Show images with smiling classes (1 for smiling, -1 for not smiling)
display_count = 15
ncols_display = 5
plt.figure(figsize=(16,12))
for i in range(1, display_count + 1):
smile_class = df_smile['smiling'].loc[i - 1]
ax = plt.subplot(int(np.ceil(display_count / ncols_display)), ncols_display, i)
ax.imshow(convert_rgb(cv2.imread(f'./kaggle_celeb_images/img_align_celeba/img_align_celeba/{str(i).zfill(6)}.jpg')))
ax.set_title(f'Image {i}, Smile : {smile_class}')
ax.axis('off')
plt.suptitle(f'Faces: First {display_count} images', y=1, fontsize=18);
plt.tight_layout(pad=3)
# plt.savefig('./graphics/face_15.png')

Reference — the tf.keras.layers.Conv2D signature:
tf.keras.layers.Conv2D(
filters, kernel_size, strides=(1, 1), padding='valid',
data_format=None, dilation_rate=(1, 1), groups=1, activation=None,
use_bias=True, kernel_initializer='glorot_uniform',
bias_initializer='zeros', kernel_regularizer=None,
bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
bias_constraint=None, **kwargs
)
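The `padding` and `strides` arguments determine the output size: with 'valid' padding each spatial dimension shrinks by kernel_size - 1. A quick sketch of the arithmetic (matching the 32 → 30 → 28 shapes in the CNN summary above):

```python
def conv_out(in_size, kernel, stride=1, padding='valid'):
    """Output length of one spatial dimension of a Conv2D layer."""
    if padding == 'same':
        return -(-in_size // stride)  # ceil(in_size / stride)
    return (in_size - kernel) // stride + 1

print(conv_out(32, 3))                  # → 30 (conv2d layer above)
print(conv_out(30, 3))                  # → 28 (conv2d_1 layer above)
print(conv_out(32, 3, padding='same'))  # → 32
```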
A shout out to thank my instructional team at General Assembly, as well as the team at large for facilitating my learning journey. It's been fun learning with my coursemates and the GA community globally; thanks for the good times! Also, Josh Starmer at StatQuest has been a boon to society. :)
Thanks also to IMDA for their steadfast commitment to lifelong learning of digital skills and sponsorship of programmes like the Tech Immersion and Placement Programme (TIPP).
Special thanks to old buddies and new friends I've made in the process of reaching out via professional networks / through social circles. If you're reading this, you know who you are! :)
:bowtie: :beer: :pizza: :sparkling_heart: :muscle: :clap: :tada: